Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refact: nydusd's state machine #439

Merged
merged 1 commit into from
May 31, 2022
Merged

Conversation

ccx1024cc
Copy link
Contributor

QUESTION

UPGRADING and INTERRUPTED are designed for hot-upgrading, which is a use case instead of daemon's state. As a result, they are weird in lack of upgrading function.

DESIGN

Merge UPGRADING and INTERRUPTED to READY. READY means service is well-configured, but waiting for trigger. For fuse-based daemon, it means fuse device is mounted, while fuse mountpoint may be not attached. The daemon lives in INIT -> READY -> RUNNING -> READY -> DIE.

CHANGE

  1. Daemon context with new status.
  2. Async TerminateFuseService, Umount actions.
  3. Start endpoint, which is usually used after takeover endpoint.
  4. Async exit after fuse device unmounted abnormally.

kicker: &dyn Fn(ApiRequest) -> ApiResponse,
) -> HttpResult {
match (req.method(), req.body.as_ref()) {
(Method::Put, None) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PUT method in RESTful API design means that the operation can be executed multiple times and always have an idempotent state, is this what the API expects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PUT method in RESTful API design means that the operation can be executed multiple times and always have an idempotent state, is this what the API expects?

It's expected that daemon in READY is trigger to RUNNING. Just like takeover endpoint with PUT method, it's required for daemon in certain status.

}

if s == DaemonState::RUNNING {
self.on_event(DaemonStateMachineInput::Stop)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not merge the two events handling if blocks.
They both react on Stop event

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not merge the two events handling if blocks. They both react on Stop event

The INIT can transfer to DIE by stop directly. While the RUNNING is transferred to DIE via READY. For INIT, there is only one stop event, instead of two stop events for RUNNING.

@changweige
Copy link
Contributor

changweige commented May 23, 2022

Please check the patch's changelog, it is somewhat a mess of duplicated messages. Can you squash the SOB lines?
I suggest adding this PR's description into the commit message.

@ccx1024cc
Copy link
Contributor Author

Please check the patch's changelog, it is somewhat a mess of duplicated messages. Can you squash the SOB lines? I suggest adding this PR's description into the commit message.

Good idea! 3q

@ccx1024cc ccx1024cc force-pushed the upmaster branch 2 times, most recently from 0cbb1eb to 774c6c0 Compare May 23, 2022 02:29
@@ -289,6 +291,7 @@ lazy_static! {
r.routes.insert(endpoint_v1!("/daemon/events"), Box::new(EventsHandler{}));
r.routes.insert(endpoint_v1!("/daemon/backend"), Box::new(FsBackendInfo{}));
r.routes.insert(endpoint_v1!("/daemon/exit"), Box::new(ExitHandler{}));
r.routes.insert(endpoint_v1!("/daemon/start"), Box::new(StartHandler{}));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we relocate the API path to /daemon/fuse/start since the handler of endpoint /daemon/start just starts fuse server's service loop. For nydusd itself, it is already started, otherwise how it can handle API requests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we relocate the API path to /daemon/fuse/start since the handler of endpoint /daemon/start just starts fuse server's service loop. For nydusd itself, it is already started, otherwise how it can handle API requests?

The start endpoint works with status Ready and RUNNING, which are designed for only fuse but also for other implements. Maybe run is more comfortable?

src/bin/nydusd/fusedev.rs Outdated Show resolved Hide resolved
Mount => Running [StartService],
Takeover => Upgrading [Restore],
Exit => Die[StopStateMachine],
Mount => Ready,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we no longer need Mount event since nothing is done when migrating state to Ready

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we no longer need Mount event since nothing is done when migrating state to Ready

In fact, we shall do Mount action. However, it's a bit risky now. Because the Mount content are different in different cases, such as upgrade startup, normal startup, endpoint startup. In total, You're right. There will be abstraction for Mount action in future.

.map_err(|e| {
DaemonError::WaitDaemon(
*e.downcast::<Error>()
.unwrap_or_else(|e| Box::new(eother!(e))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um. Hard error conversion path to io::Error, maybe let WaitDaemon wrap another type of error message is an easier way. But has nothing to do with this PR.

src/bin/nydusd/fusedev.rs Outdated Show resolved Hide resolved
@@ -477,12 +508,13 @@ pub fn create_nydus_daemon(
inflight_ops: Mutex::new(Vec::new()),
result_receiver: Mutex::new(result_receiver),
trigger: Arc::new(Mutex::new(trigger)),
threads: Mutex::new(Vec::new()),
state_machine_thread: Mutex::new(None),
fuse_service_threads: Mutex::new(Vec::new()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is reaping fuse threads important?
Only wait for state machine termination?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is reaping fuse threads important? Only wait for state machine termination?

Yep, seeking for waiting state machine and fuse service thread termination separately. Because daemon status is only updated after successful action. For example, interrupt action is executed, then wait fuse service workers done, finally the daemon status is changed from RUNNING to READY.

Copy link
Member

@bergwolf bergwolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

`UPGRADING` and `INTERRUPTED` are designed for hot-upgrading, which is a use case instead of daemon's state. As a result, they are weird in lack of upgrading function.

Merge `UPGRADING` and `INTERRUPTED` to `READY`. `READY` means service is well-configured, but waiting for trigger. For fuse-based daemon, it means fuse device is mounted, while fuse mountpoint may be not attached. The daemon lives in `INIT` -> `READY` -> `RUNNING` -> `READY` -> `DIE`.

1. Daemon context with new status.
2. Async `TerminateFuseService`, `Umount` actions.
3. `Start` endpoint, which is usually used after `takeover` endpoint.
4. Async exit after fuse device unmounted abnormally.

Signed-off-by: 泰友 <cuichengxu.ccx@antgroup.com>
@imeoer imeoer merged commit 6be13a7 into dragonflyoss:master May 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants